This article presents progressive learning for the structural tolerance online
sequential extreme learning machine (PSTOS-ELM). PSTOS-ELM retains robust
accuracy while it is updated online with new data, including data from new
classes. The robustness comes from the Householder block exact QR decomposition
recursive least squares (HBQRD-RLS) algorithm used in PSTOS-ELM. The method
suits applications with streaming data in which new classes appear often. Our
experiments compare the accuracy and the robustness of PSTOS-ELM during data
updates with the batch extreme learning machine (ELM) and the structural
tolerance online sequential extreme learning machine (STOS-ELM), both of which
must retrain on the data when a new class appears. The experimental results show
that PSTOS-ELM has accuracy and robustness comparable to ELM and STOS-ELM while
also being able to learn new class data immediately.

1. INTRODUCTION

The extreme learning machine (ELM) is a fast training algorithm for single-hidden-layer feed-forward
networks (SLFNs) [1]-[3]. ELM obtains its output weights as a least-squares solution, which leads to
satisfactory accuracy. However, ELM relies on batch-mode learning: to learn new data, batch-mode ELM has to
retrain on the whole dataset, including both the previously trained data and the newly arrived data. For
example, suppose ELM has been trained on 1,000 samples and 10 new samples arrive. ELM must then train on all
1,010 samples to update its knowledge, and this happens every time new data arrive. Batch-mode learning
therefore requires repeated retraining on the same data.

Online learning [4] is a machine learning approach that updates the model with training data in sequential
order. Batch learning methods, in contrast, build the model by learning on the entire training set at once.
Online learning lets a model learn from new data in close to real time, and it is used in many applications
such as intrusion detection [5] and facial expression recognition [6], [7].

In the case of ELM with online learning, Liang et al. [8] proposed the online sequential ELM (OS-ELM),
which adds an online data update capability with accuracy comparable to ELM. With this capability, OS-ELM can
learn new training data without retraining on the data already learned. However, OS-ELM may lose information
when it is updated continuously. After each update, the weight and bias matrices of OS-ELM are stored and
updated in memory. Because memory is finite, long decimal expansions are rounded, which may cause the stored
weights and biases to lose information. This is called round-off error. The round-off error accumulates with
the number of updates. Therefore, the error in the data update process has to be taken into account, because
it can reduce the generalization of OS-ELM [9].
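The accumulation of round-off error can be illustrated with a few lines of Python. This is only a generic floating-point illustration, not OS-ELM itself: the function name `naive_running_sum` and the update values are hypothetical, and the point is simply that repeated in-place updates of a stored value drift away from the exactly rounded result.

```python
import math

def naive_running_sum(values):
    """Accumulate values one at a time, rounding after every update."""
    total = 0.0
    for v in values:
        total += v  # each addition is rounded to the nearest double
    return total

# Ten identical "updates" of 0.1 (0.1 has no exact binary representation).
updates = [0.1] * 10
naive = naive_running_sum(updates)   # rounding errors accumulate
exact = math.fsum(updates)           # correctly rounded sum of the same values

print(naive == 1.0, exact == 1.0)
print(abs(naive - exact))
```

The more updates are chained, the more such rounding steps are applied to the stored quantity, which is the effect that a numerically robust update scheme tries to limit.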

Horata et al. [10] proposed the structural tolerance OS-ELM (STOS-ELM), a variant of OS-ELM that
addresses the robustness problem of OS-ELM during data updates. STOS-ELM uses the Householder block exact QR
decomposition recursive least squares (HBQRD-RLS) algorithm to enable online learning. In addition, HBQRD-RLS
reduces the effect of round-off error and improves the robustness of STOS-ELM during online updates
[10]-[12].

OS-ELM and STOS-ELM suit applications in which the classes of the dataset are fixed, because they can
learn a new class of data only by retraining on all of the data. They may not support applications that work
with data arriving in real time, where the nature of the training data is unknown. A learning technique
adapted to this situation is therefore needed [9].

Progressive learning is inspired by the human learning paradigm. Humans continue to learn whenever they
encounter a new phenomenon: they can resume, adapt, and grow their knowledge of the phenomenon while keeping
the knowledge learned thus far. Progressive learning is used in many applications, such as vegetable disease
recognition [13] and COVID-19 diagnosis [14].

Venkatesan and Er [15] proposed the progressive learning technique for OS-ELM (POS-ELM), which combines
OS-ELM with progressive learning. The technique lets OS-ELM learn new class data while retaining the knowledge
of the previous classes, much as in human learning. POS-ELM thus removes the constraint of OS-ELM on the number
of classes during data updates. However, POS-ELM still suffers from the same round-off error as OS-ELM.

This article discusses these problems and proposes progressive learning for the structural tolerance OS-ELM
(PSTOS-ELM), which aims to keep the robustness of STOS-ELM when new class data is added. Our experiments show
that PSTOS-ELM has higher accuracy and more robustness than POS-ELM, while still supporting the progressive
learning technique.

This article is structured as follows. The method section describes ELM, OS-ELM, STOS-ELM, and the
proposed algorithm, PSTOS-ELM. The results and discussion section explains the experimental details and
results. The last section concludes the article.


2. METHOD 
2.1. Extreme learning machine 

ELM is a fast training algorithm [1], [2]. Consider an ELM with K hidden nodes and D-dimensional input.
The samples are given as $(x_i, t_i)$, $i = 1, 2, \ldots, N$, where $x_i \in \mathbb{R}^{D}$ is a row of the
training matrix $X_{N \times D}$, $t_i \in \mathbb{R}^{C}$ is a row of the target matrix $T_{N \times C}$, and C
is the number of classes. ELM can be expressed in least-squares form:

$$\beta_{K \times C} = H^{\dagger} T \tag{1}$$

where $\beta_{K \times C} = [\beta_{i1}\ \beta_{i2}\ \cdots\ \beta_{iC}]$, $i = 1, 2, \ldots, K$, is the matrix of
output weights and $H^{\dagger}$ is the pseudo-inverse of H, which can be calculated from the Moore-Penrose
formulation. The hidden layer output matrix H is computed from the input as:

$$H = g(XW + B) = [h_{ij}] = [g(x_i w_j + b_j)] \tag{2}$$

where $g(\cdot)$ is the activation function, and the input weights
$W_{D \times K} = [w_{1j}\ w_{2j}\ \cdots\ w_{Dj}]$, $j = 1, 2, \ldots, K$, and the biases
$B_{N \times K} = [b_j, b_j, \ldots, b_j]^{T}$, $j = 1, 2, \ldots, K$, are randomly generated in the ranges
[0,1] and [-1,1], respectively.
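As a rough illustration of (1) and (2), the following pure-Python sketch trains a tiny ELM. Several choices here are assumptions made for the example and are not taken from the article: the sigmoid activation, the helper names (`matmul`, `solve`, `hidden_output`, `random_weights`, `elm_fit`, `predict`), and the use of the normal equations $(H^{T}H)\beta = H^{T}T$ in place of a general Moore-Penrose routine, which is only valid when H has full column rank.

```python
import math
import random

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def solve(a, b):
    """Solve a @ x = b for square a by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def hidden_output(x, w, b):
    """H = g(XW + B) as in (2); g is a sigmoid here (an assumption)."""
    xw = matmul(x, w)
    return [[1.0 / (1.0 + math.exp(-(v + bj))) for v, bj in zip(row, b)] for row in xw]

def random_weights(d, k, rng):
    """Input weights in [0, 1] and biases in [-1, 1], as stated in the text."""
    w = [[rng.uniform(0.0, 1.0) for _ in range(k)] for _ in range(d)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    return w, b

def elm_fit(x, t, w, b):
    """Output weights from (1), computed through the normal equations."""
    h = hidden_output(x, w, b)
    ht = transpose(h)
    return solve(matmul(ht, h), matmul(ht, t))

def predict(x, w, b, beta):
    """Predicted class = index of the largest output score."""
    scores = matmul(hidden_output(x, w, b), beta)
    return [max(range(len(s)), key=s.__getitem__) for s in scores]

# Toy usage: 4 samples, D = 2 features, C = 2 classes (one-hot targets), K = 2.
rng = random.Random(0)
x_train = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
t_train = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
w, b = random_weights(2, 2, rng)
beta = elm_fit(x_train, t_train, w, b)
labels = predict(x_train, w, b, beta)
```

Only the output weights are fitted; the random input weights and biases stay fixed, which is what makes ELM training a single least-squares solve.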


2.2. Online sequential extreme learning machine 

OS-ELM [8] can update its knowledge by training exclusively on new data. Its calculation can be summarized
as follows. Suppose the number of hidden nodes K is less than or equal to the number of samples
$N = N_{k-1} + N_k$, where the $N_{k-1}$ previously trained samples are $(X_{k-1}, T_{k-1})$ with $X_{k-1}$ of
size $N_{k-1} \times D$ and $T_{k-1}$ of size $N_{k-1} \times C$, and the $N_k$ newly delivered samples are
$(X_k, T_k)$ with $X_k$ of size $N_k \times D$ and $T_k$ of size $N_k \times C$. The output weights $\beta_k$
are then formulated as:

$$\beta_k = K_k^{-1} \begin{bmatrix} H_{k-1} \\ H_k \end{bmatrix}^{T} \begin{bmatrix} T_{k-1} \\ T_k \end{bmatrix} \tag{3}$$

where k is the index of the newly delivered data and k - 1 is the index of the previously trained data, and

$$K_k = \begin{bmatrix} H_{k-1} \\ H_k \end{bmatrix}^{T} \begin{bmatrix} H_{k-1} \\ H_k \end{bmatrix} \tag{4}$$

For sequential learning, (4) can be rewritten in terms of $K_{k-1}$ as:

$$K_k = K_{k-1} + H_k^{T} H_k \tag{5}$$

Thus, the result of combining (3) and (5) is:

$$\beta_k = \beta_{k-1} + K_k^{-1} H_k^{T} (T_k - H_k \beta_{k-1}) \tag{6}$$

Using the Woodbury formula [16], (5) can be rewritten to calculate $K_k^{-1}$:

$$P_k = P_{k-1} - P_{k-1} H_k^{T} (I + H_k P_{k-1} H_k^{T})^{-1} H_k P_{k-1} \tag{7}$$

where $P_k = K_k^{-1}$.
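The recursion in (6) and (7) can be sketched for the special case in which each update delivers a single sample ($N_k = 1$), so that the inverse in (7) reduces to a scalar division. The function name `os_elm_update` and the toy numbers are hypothetical, and the sketch assumes that $P_{k-1}$ is symmetric, which holds when it is the inverse of $H^{T}H$.

```python
def os_elm_update(p, beta, h, t):
    """One OS-ELM step for a single new sample.

    p    : K x K matrix P_{k-1} (assumed symmetric)
    beta : K x C output weights beta_{k-1}
    h    : length-K hidden-layer row H_k of the new sample
    t    : length-C target row T_k of the new sample
    """
    k = len(h)
    # P_{k-1} H_k^T (a length-K vector)
    ph = [sum(p[i][j] * h[j] for j in range(k)) for i in range(k)]
    # Scalar 1 + H_k P_{k-1} H_k^T from (7)
    denom = 1.0 + sum(h[i] * ph[i] for i in range(k))
    # (7): P_k = P_{k-1} - P_{k-1} H_k^T H_k P_{k-1} / denom
    p_new = [[p[i][j] - ph[i] * ph[j] / denom for j in range(k)] for i in range(k)]
    # Prediction error T_k - H_k beta_{k-1}
    error = [t[c] - sum(h[i] * beta[i][c] for i in range(k)) for c in range(len(t))]
    # (6): beta_k = beta_{k-1} + P_k H_k^T (T_k - H_k beta_{k-1})
    gain = [sum(p_new[i][j] * h[j] for j in range(k)) for i in range(k)]
    beta_new = [[beta[i][c] + gain[i] * error[c] for c in range(len(t))]
                for i in range(k)]
    return p_new, beta_new

# Toy check: initial data H_0 = I (2 x 2), T_0 = [[1], [2]], so P_0 = I and
# beta_0 = [[1], [2]]; one new sample with h = [1, 1] and t = [4] arrives.
p0 = [[1.0, 0.0], [0.0, 1.0]]
beta0 = [[1.0], [2.0]]
p1, beta1 = os_elm_update(p0, beta0, [1.0, 1.0], [4.0])
print(beta1)  # close to the batch least-squares solution [[4/3], [7/3]]
```

Solving the three-sample problem in batch form gives the same weights, which is the sense in which OS-ELM matches ELM without revisiting the old samples.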


2.3. Structural tolerance sequential extreme learning machine 

STOS-ELM [10] makes sequential learning in ELM robust by using HBQRD-RLS to store and update the square
root factor covariance matrix ($R_k^{-1}$) that is used to update the output weights ($\beta_k$). The
calculation steps of STOS-ELM when new data is delivered are described below. The initial phase begins with the
calculation of $\tilde{R}_0^{-1}$ at time k = 1, where $\tilde{R}_0^{-1} \approx H_0^{\dagger}$, by:

$$\tilde{R}_0^{-1} = R_0^{-1} Q^{T} \tag{8}$$

This follows from $Q^{T} H \beta = Q^{T} T$ with $QR = H$, a process called QR decomposition. Q is an orthogonal
matrix with the property $Q^{T} Q = I_{N \times N}$, where $Q \in \mathbb{R}^{N \times N}$, and R is an
$N \times K$ upper triangular matrix that has the same values as the square root of $H^{T} H$, that is,
$R^{T} R = H^{T} H$. As a result, (8) can be solved as a triangular system [17]. The initial output weight
($\beta_0$) can then be calculated by:

$$\beta_0 = \tilde{R}_0^{-1} T_0 \tag{9}$$
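The initial phase in (8) and (9) amounts to a least-squares solve through a QR decomposition. The sketch below is a plain Householder QR followed by back substitution on $R\beta = Q^{T}T$; it is not the block HBQRD-RLS update itself, the function name `householder_qr_solve` is hypothetical, and it assumes that H has full column rank (otherwise a diagonal entry of R is zero and the division fails).

```python
import math

def householder_qr_solve(h, t):
    """Least-squares solution of h @ beta = t via Householder QR (N >= K)."""
    n, k = len(h), len(h[0])
    r = [row[:] for row in h]      # is overwritten with R
    qt_t = [row[:] for row in t]   # is overwritten with Q^T T
    for j in range(k):
        # Householder vector v that zeroes column j below the diagonal.
        norm = math.sqrt(sum(r[i][j] ** 2 for i in range(j, n)))
        if norm == 0.0:
            continue
        v = [0.0] * n
        for i in range(j, n):
            v[i] = r[i][j]
        v[j] += math.copysign(norm, r[j][j])
        vv = sum(x * x for x in v)
        # Apply the reflection I - 2 v v^T / (v^T v) to R and to Q^T T.
        for m in (r, qt_t):
            for c in range(len(m[0])):
                s = 2.0 * sum(v[i] * m[i][c] for i in range(j, n)) / vv
                for i in range(j, n):
                    m[i][c] -= s * v[i]
    # Back substitution on the upper-triangular K x K block: R beta = Q^T T.
    cols = len(t[0])
    beta = [[0.0] * cols for _ in range(k)]
    for c in range(cols):
        for i in reversed(range(k)):
            acc = qt_t[i][c] - sum(r[i][p] * beta[p][c] for p in range(i + 1, k))
            beta[i][c] = acc / r[i][i]
    return beta

# Toy usage: three samples, two hidden nodes, one output column.
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t0 = [[1.0], [2.0], [4.0]]
beta_init = householder_qr_solve(h0, t0)
print(beta_init)  # close to [[4/3], [7/3]]
```

Because the orthogonal reflections do not amplify rounding errors, this route avoids forming $H^{T}H$ explicitly, which is the motivation for QR-based variants of recursive least squares.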
In the sequential phase, STOS-ELM uses HBQRD-RLS to produce the relational matrix:

$$G_k = -(R_{k-1}^{-1})^{T} H_k^{T} \tag{10}$$

where $G_k$ is the relational matrix between the newly delivered data and the previously trained data.

The next step is to store and update the square root factor covariance matrix $R_k^{-1}$ and to produce
$F_k$ and $E_k^{T}$. By applying Lemma 1 in [18], HBQRD-RLS works on the basis of the Householder
transformation [19], [20] to produce an orthogonal matrix U(k) such that:

$$U(k) \begin{bmatrix} I_{N_k \times N_k} & 0_{N_k \times K} \\ G_{k\,(K \times N_k)} & (R_{k-1}^{-1})^{T}_{(K \times K)} \end{bmatrix} = \begin{bmatrix} F_{k\,(N_k \times N_k)} & E^{T}_{k\,(N_k \times K)} \\ 0_{K \times N_k} & (R_k^{-1})^{T}_{(K \times K)} \end{bmatrix} \tag{11}$$

where I is the identity matrix and 0 is the zero matrix. The last step is to update the new output weight
$\beta_k$:

$$\beta_k = \beta_{k-1} + E_k (F_k^{-1})^{T} (T_k - H_k \beta_{k-1}) \tag{12}$$

where $(T_k - H_k \beta_{k-1})$ is the prediction error on the new data, and $F_k$ and $E_k^{T}$ form the
Kalman gain.


2.4. Progressive structural tolerance online sequential extreme learning machine (the proposed method) 

PSTOS-ELM is an STOS-ELM that can learn a new class while maintaining the old knowledge (the trained
classes). In the training phase, when data of P new classes arrive at PSTOS-ELM, the output weight matrix at
time k - 1, $\beta_{k-1}$, is recalibrated to support the new classes by using a recalibration matrix
$\Delta\beta_k$:

$$\beta_{k-1} \leftarrow [\beta_{k-1}\ \ \Delta\beta_k] \tag{13}$$

The recalibration matrix ($\Delta\beta_k$) holds the output weights of the classes that have not yet been
trained. It is calculated by multiplying the square root factor covariance matrix of the newly delivered data
($\tilde{R}_k^{-1}$) by an $N_k \times P$ matrix whose entries are all -1:

$$\Delta\beta_k = \tilde{R}^{-1}_{k\,(K \times N_k)} \begin{bmatrix} -1 & \cdots & -1 \\ \vdots & \ddots & \vdots \\ -1 & \cdots & -1 \end{bmatrix}_{N_k \times P} \tag{14}$$

where $\tilde{R}_k^{-1} = R_k^{-1} Q^{T}$ and Q is the Q factor of the QR decomposition that yields R.
PSTOS-ELM is summarized in Algorithm 1.


Algorithm 1. PSTOS-ELM algorithm
Initial phase
1. Define the number of hidden nodes K and set k = 1.
2. Map the initial data $X_{k-1}$ to the hidden layer output matrix $H_{k-1}$ by using (2).
3. Initialize $\tilde{R}_{k-1}^{-1} \approx H_{k-1}^{\dagger}$ and the corresponding solution $\beta_{k-1} = R_{k-1}^{-1} Q^{T} T_{k-1}$.
Sequential phase
4. When a new sample set $X_k$ arrives, map $X_k$ to $H_k$ by using (2).
5. If new classes are introduced, update $\beta_{k-1}$ by using (13).
6. Calculate $G_k$ as in (10).
7. Store and update $R_k^{-1}$, $F_k$, and $E_k^{T}$ by using (11).
8. Update the output weights $\beta_k$ at time k as in (12).
9. Increase k by 1 when the next sample set arrives, and then go to step 4.
End
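Step 5 of Algorithm 1, the recalibration in (13) and (14), can be sketched as follows. The function name `recalibrate` and the toy matrices are hypothetical; the sketch takes $\tilde{R}_k^{-1}$ as a given K x N_k matrix and does not show how it is obtained.

```python
def recalibrate(beta_prev, r_tilde_inv, num_new_classes):
    """Append output-weight columns for newly introduced classes.

    beta_prev       : K x C output weights beta_{k-1}
    r_tilde_inv     : K x N_k matrix standing in for R~_k^{-1}
    num_new_classes : P, the number of classes seen for the first time
    """
    n_k = len(r_tilde_inv[0])
    # N_k x P matrix filled with -1, as in (14).
    minus_ones = [[-1.0] * num_new_classes for _ in range(n_k)]
    # (14): delta_beta = R~_k^{-1} @ minus_ones, a K x P matrix.
    delta_beta = [
        [sum(row[j] * minus_ones[j][p] for j in range(n_k))
         for p in range(num_new_classes)]
        for row in r_tilde_inv
    ]
    # (13): beta_{k-1} <- [beta_{k-1}  delta_beta], column-wise concatenation.
    return [old + new for old, new in zip(beta_prev, delta_beta)]

# Toy usage: K = 2 hidden nodes, C = 2 trained classes, P = 1 new class.
beta_prev = [[1.0, 2.0], [3.0, 4.0]]
r_tilde_inv = [[0.5, 0.25], [1.0, -1.0]]
beta_wide = recalibrate(beta_prev, r_tilde_inv, 1)
print(beta_wide)  # [[1.0, 2.0, -0.75], [3.0, 4.0, 0.0]]
```

The existing columns of the output weights are left untouched, so the knowledge of the trained classes is preserved while the matrix grows from C to C + P columns.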


3. RESULTS AND DISCUSSION 
3.1. Experimental setup 

This section describes the datasets and how they are prepared. The six datasets come from the University
of California, Irvine (UCI) repository [21] and are the same as those used in the experiment of Venkatesan and
Er [15]. The dataset details are shown in Table 1.


Table 1. The datasets

Dataset      Number of classes   Number of features
Iris         3                   4
Balance      3                   4
Waveform     3                   32
Wine         3                   13
Satellite    5                   36
Optdigits    10                  63


The datasets are used to evaluate the performance of the methods. All methods are run in MATLAB version
R2014a on a computer with a Core i3 3.40 GHz processor and 8.00 GB of RAM. The training and testing data in our
experiments are prepared as follows:

a) Separate the data of each class into two groups: 70 percent for training and 30 percent for testing
[22].

b) Sort the training data by class number in ascending order.

c) Use the data of the first two classes as the initial data and the data of the remaining classes as the
sequential data.

This process is used to validate the performance when new class data is added. Because the performance
depends on the random input weights and biases, each experiment is repeated for ten rounds. In each round, one
set of input weights and biases is generated and shared by all methods. The number of hidden nodes of every
method is varied in the range [1,200].
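The preparation steps a) to c) can be sketched in Python as below. The function name `prepare_data`, the `(features, label)` sample format, and the shuffling within each class before the split are assumptions made for this example; the article does not state how the 70 percent are chosen.

```python
import random

def prepare_data(samples, train_fraction=0.7, seed=0):
    """Split per class, sort by class, and separate initial from sequential data.

    samples: list of (features, class_label) pairs.
    Returns (initial, sequential, test).
    """
    rng = random.Random(seed)
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append((features, label))

    train, test = [], []
    for label in sorted(by_class):        # b) ascending class order
        group = by_class[label][:]
        rng.shuffle(group)                # assumption: random split within a class
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])         # a) 70 percent for training
        test.extend(group[cut:])          #    30 percent for testing

    first_two = sorted(by_class)[:2]      # c) the first two classes are initial data
    initial = [s for s in train if s[1] in first_two]
    sequential = [s for s in train if s[1] not in first_two]
    return initial, sequential, test

# Toy usage: 3 classes with 10 one-dimensional samples each.
toy = [([float(i)], label) for label in (2, 0, 1) for i in range(10)]
initial, sequential, test = prepare_data(toy)
print(len(initial), len(sequential), len(test))  # 14 7 9
```

With this layout the third class (and any later class) reaches the learner only during the sequential phase, which is what exercises the new-class update.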


3.2. Results 
3.2.1. Accuracy of PSTOS-ELM 

This section reports the performance of ELM, OS-ELM, POS-ELM [15], STOS-ELM, and PSTOS-ELM. The five
methods are evaluated by the average, max, and min accuracy over the 10-round test, the standard deviation (SD)
over the 10-round test, and the number of hidden nodes that gives each method its best accuracy, as shown in
Table 2. The bold letters show the best value of each dataset (the meta-metrics evaluation [23], [24]).
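The per-method statistics reported for the 10-round test can be computed as in the following sketch. The function name `summarize_rounds` and the sample accuracies are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) rather than the population form is an assumption, since the article does not say which one is used.

```python
import statistics

def summarize_rounds(accuracies):
    """Average, max, min, and SD of the test accuracies over several rounds."""
    return {
        "mean": statistics.mean(accuracies),
        "max": max(accuracies),
        "min": min(accuracies),
        "sd": statistics.stdev(accuracies),  # assumption: sample SD
    }

# Toy usage with two rounds only (the experiments use ten).
summary = summarize_rounds([0.5, 1.0])
print(summary["mean"], summary["max"], summary["min"])  # 0.75 1.0 0.5
```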

Table 2 shows the performance of the methods over the six datasets. PSTOS-ELM has average, max, and min
accuracy slightly lower than STOS-ELM, which has the highest average accuracy. The differences between
STOS-ELM and PSTOS-ELM in average, max, and min accuracy are 0.0017, 0.0033, and 0.0019, respectively.

Table 2. The performance of methods

Dataset            Method      Accuracy   Max      Min      SD       Number of nodes
Balance            ELM         0.9143     0.9206   0.9101   0.0033   15
                   OS-ELM      0.9143     0.9206   0.9101   0.0033   15
                   STOS-ELM    0.9143     0.9206   0.9101   0.0033   15
                   POS-ELM     0.8360     0.8889   0.7831   0.0402   58
                   PSTOS-ELM   0.9122     0.9206   0.8995   0.0062   17
Iris               ELM         0.9667     0.9778   0.9333   0.0157   21
                   OS-ELM      0.9622     1.0000   0.9111   0.0235   105
                   STOS-ELM    0.9689     0.9778   0.9333   0.0155   21
                   POS-ELM     0.9622     0.9778   0.9111   0.0211   104
                   PSTOS-ELM   0.9644     0.9778   0.9333   0.0187   21
Optdigits          ELM         0.8200     0.8215   0.8186   0.0009   195
                   OS-ELM      0.5785     0.6525   0.5201   0.0390   28
                   STOS-ELM    0.8199     0.8209   0.8186   0.0008   194
                   POS-ELM     0.3471     0.5437   0.2417   0.0881   118
                   PSTOS-ELM   0.8198     0.8209   0.8186   0.0007   194
Satellite          ELM         0.7342     0.7378   0.7311   0.0020   200
                   OS-ELM      0.5251     0.5959   0.4514   0.0569   89
                   STOS-ELM    0.7342     0.7378   0.7311   0.0020   200
                   POS-ELM     0.4771     0.6054   0.2912   0.1181   86
                   PSTOS-ELM   0.7342     0.7372   0.7311   0.0020   200
Waveform           ELM         0.8580     0.8594   0.8568   0.0009   190
                   OS-ELM      0.8211     0.8354   0.7808   0.0203   184
                   STOS-ELM    0.8578     0.8594   0.8568   0.0008   190
                   POS-ELM     0.7783     0.8315   0.6749   0.0524   147
                   PSTOS-ELM   0.8577     0.8588   0.8561   0.0008   190
Wine               ELM         0.9836     1.0000   0.9818   0.0057   42
                   OS-ELM      0.9855     1.0000   0.9818   0.0077   42
                   STOS-ELM    0.9855     1.0000   0.9818   0.0077   42
                   POS-ELM     0.9818     0.9818   0.9818   0.0000   28
                   PSTOS-ELM   0.9818     0.9818   0.9818   0.0000   12
Average accuracy   ELM         0.8795     0.8862   0.8719   0.0040   -
                   OS-ELM      0.7978     0.8341   0.7592   0.0258   -
                   STOS-ELM    0.8801     0.8861   0.8719   0.0040   -
                   POS-ELM     0.7304     0.8032   0.6473   0.0505   -
                   PSTOS-ELM   0.8784     0.8828   0.8701   0.0050   -


3.2.2. Robustness with the best-hidden node when data updating 

This experiment simulates the randomization of weights and biases. It aims to analyze the robustness and
accuracy of the methods over a 10-round test with different weights and biases while the data is updated. Each
method uses its best number of hidden nodes from Table 2. Figure 1(a) to Figure 1(e), Figure 2(a) to
Figure 2(e), and Figure 3(a) to Figure 3(e) show the average accuracy (red line), max accuracy (green line),
and min accuracy (blue line) of ELM, OS-ELM, STOS-ELM, POS-ELM, and PSTOS-ELM with the appropriate number of
hidden nodes. The three figures cover three datasets: optdigits (Figure 1), satellite (Figure 2), and balance
(Figure 3). The x-axis shows the percentage of data updated, starting with the data of the third class of the
dataset. If the dataset has more than three classes, a red line marks the percentage at which the data of a new
class is trained.

From Figure 1 to Figure 3, the results fall into two groups. In the first group (Figure 1 and Figure 2),
ELM (Figure 1(a) and Figure 2(a)), STOS-ELM (Figure 1(c) and Figure 2(c)), and PSTOS-ELM (Figure 1(e) and
Figure 2(e)) have accuracy that grows robustly. POS-ELM (Figure 1(d) and Figure 2(d)) also shows growing
accuracy, but it is less accurate and less robust than ELM. For OS-ELM (Figure 1(b) and Figure 2(b)), the
accuracy trend is like that of POS-ELM, but its min accuracy trend is lower. The optdigits, satellite, and
waveform datasets are in the first group. In the second group (Figure 3), all methods show the same growth
trend. The balance, iris, and wine datasets are in the second group.

The satellite dataset differs from the rest of the first group in one respect (Figure 2). After OS-ELM is
updated with the data of the fifth class (Figure 2(b)), its accuracy trends downward, whereas the accuracy of
POS-ELM (Figure 2(d)) trends upward. Nevertheless, OS-ELM has accuracy comparable to POS-ELM.


3.2.3. Robustness over a different number of nodes when data updating 

This experiment examines the accuracy for each number of hidden nodes. It aims to analyze the robustness
and accuracy of the methods with different numbers of hidden nodes. Figure 4 to Figure 6 show the accuracy of
ELM, OS-ELM, STOS-ELM, POS-ELM, and PSTOS-ELM at each percentage of updated data (x-axis) with different numbers
of hidden nodes. The three figures cover three datasets: satellite (Figure 4), iris (Figure 5), and wine
(Figure 6).

From Figure 4(a) to Figure 6(e), the results can be separated into three groups. In the first group
(Figure 4), ELM (Figure 4(a)), STOS-ELM (Figure 4(c)), and PSTOS-ELM (Figure 4(e)) have accuracy that tends to
grow for all numbers of hidden nodes. In contrast, the accuracy of OS-ELM (Figure 4(b)) and POS-ELM
(Figure 4(d)) rises slightly and then drops for some numbers of hidden nodes. The satellite, balance,
optdigits, and waveform datasets are in the first group.

In the second group (Figure 5), STOS-ELM (Figure 5(c)) and PSTOS-ELM (Figure 5(e)) have accuracy that
tends to grow for all numbers of hidden nodes. In contrast, the accuracy of ELM (Figure 5(a)), OS-ELM
(Figure 5(b)), and POS-ELM (Figure 5(d)) rises slightly and then drops for some numbers of hidden nodes. The
iris dataset is in the second group.

In the third group (Figure 6), OS-ELM (Figure 6(b)), POS-ELM (Figure 6(d)), STOS-ELM (Figure 6(c)), and
PSTOS-ELM (Figure 6(e)) have accuracy that tends to grow for all numbers of hidden nodes. Only ELM
(Figure 6(a)) has an accuracy trend that grows slightly, drops sharply, and then rises at 100 hidden nodes. At
150 and 200 hidden nodes, the accuracy of ELM grows less than that of the other methods. The wine dataset is in
the third group.

As seen in Figure 4 to Figure 6, the accuracy for some numbers of hidden nodes trends downward. This
raises the question of how robust the final accuracy is over a wide range of hidden node counts, which the next
section examines.

Figure 1. The accuracy of the methods while data updating with their best hidden node in optdigits dataset:
(a) ELM, (b) OS-ELM, (c) STOS-ELM, (d) POS-ELM, and (e) PSTOS-ELM

Figure 2. The accuracy of the methods while data updating with their best hidden node in satellite dataset:
(a) ELM, (b) OS-ELM, (c) STOS-ELM, (d) POS-ELM, and (e) PSTOS-ELM

Figure 3. The accuracy of the methods while data updating with their best hidden node in balance dataset:
(a) ELM, (b) OS-ELM, (c) STOS-ELM, (d) POS-ELM, and (e) PSTOS-ELM

Figure 4. Accuracy of the methods with different number of hidden nodes in satellite dataset: (a) ELM,
(b) OS-ELM, (c) STOS-ELM, (d) POS-ELM, and (e) PSTOS-ELM

Figure 5. Accuracy of the methods with different number of hidden nodes in iris dataset: (a) ELM,
(b) OS-ELM, (c) STOS-ELM, (d) POS-ELM, and (e) PSTOS-ELM

Figure 6. Accuracy of the methods with different number of hidden nodes in wine dataset: (a) ELM,
(b) OS-ELM, (c) STOS-ELM, (d) POS-ELM, and (e) PSTOS-ELM

3.2.4. Robustness when the number of hidden nodes varied in a wide range

This experiment extends the previous section by analysing the accuracy of the methods as the number of
hidden nodes varies from 1 to 200. The result is shown in Figure 7. One dataset is selected from each of the
three groups in section 3.2.3: the waveform, iris, and wine datasets, respectively.


Figure 7. The accuracies of the methods (ELM, OS-ELM, STOS-ELM, POS-ELM, and PSTOS-ELM) while the number of
hidden nodes varied in the range of 1 to 200 hidden nodes: (a) waveform dataset, (b) iris dataset, and
(c) wine dataset


For group 1 in Figure 7(a), ELM, STOS-ELM, and PSTOS-ELM have accuracy that tends to grow and then remain
stable. OS-ELM and POS-ELM have accuracy that trends lower than that of ELM, STOS-ELM, and PSTOS-ELM. In
addition, the accuracy of OS-ELM and POS-ELM fluctuates.

For group 2 in Figure 7(b), ELM, STOS-ELM, and PSTOS-ELM have a similar accuracy trend, but ELM has lower
accuracy than STOS-ELM and PSTOS-ELM between 80 and 200 hidden nodes. The accuracy of OS-ELM and POS-ELM
drops sharply between 10 and 20 hidden nodes. After that, their accuracy rises until, above 80 hidden nodes, it
is higher than that of ELM, OS-ELM, and STOS-ELM.

For group 3 in Figure 7(c), all methods have a similar accuracy trend, but the accuracy of OS-ELM and
POS-ELM drops and then rises between 83 and 100 hidden nodes, and ELM has lower accuracy than the other methods
above 100 hidden nodes. These results suggest that if the number of hidden nodes is too large, overfitting
occurs: the decision boundary fits the training data too closely. Conversely, if the number is too small, the
model cannot fit complex data.

Therefore, selecting an appropriate number of hidden nodes is required for the best performance of ELM.
However, there is no general principle for choosing the number of hidden nodes, so model validation is used
instead. Validation helps ELM find an appropriate number of nodes by testing the ELM on validation data (data
kept separate from the training data) and selecting the number of hidden nodes that gives the best accuracy.
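The validation-based selection described above can be sketched as a simple search. The function name `select_hidden_nodes` and the callback `validation_accuracy` are hypothetical: the callback stands for "train with k hidden nodes and return the accuracy on the validation data", and breaking ties in favour of the smaller network is a choice made for this example.

```python
def select_hidden_nodes(candidates, validation_accuracy):
    """Return (best_k, best_accuracy) over the candidate hidden-node counts."""
    best_k, best_acc = None, float("-inf")
    for k in candidates:
        acc = validation_accuracy(k)
        if acc > best_acc:  # strict comparison keeps the first (smallest) k on ties
            best_k, best_acc = k, acc
    return best_k, best_acc

# Toy usage with precomputed validation accuracies in place of real training.
toy_scores = {1: 0.5, 2: 0.9, 3: 0.9}
best_k, best_acc = select_hidden_nodes(sorted(toy_scores), toy_scores.get)
print(best_k, best_acc)  # 2 0.9
```

In the experiments the candidate range would be 1 to 200, matching the range of hidden nodes examined in this section.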


3.2.5. Performance comparison between PSTOS-ELM and the other ELMs with progressive learning 

This section compares the performance of PSTOS-ELM with other ELMs that use progressive learning:
PL-MCCIP, PL-MCCRP, PL-MCCMP [25], POS-ELM [15], and S-ELM [26]. The comparison covers the accuracy and
robustness of all the methods, as shown in Table 3.

For each dataset, the first line in the table shows the best accuracy of each method with its best number
of hidden nodes in the range [1,200]. The second line is the area under the accuracy curve over the data
updates, calculated with the trapz function in MATLAB. The bold letters show the best value of each dataset.
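The area under the accuracy curve is a trapezoidal sum. The following sketch mirrors what MATLAB's `trapz(Y)` computes for a vector with unit spacing between points; the Python function name `trapz`, the `dx` parameter, and the sample curve are hypothetical and used only for illustration.

```python
def trapz(y, dx=1.0):
    """Trapezoidal-rule area under the points y with uniform spacing dx."""
    return sum((a + b) * dx / 2.0 for a, b in zip(y, y[1:]))

# Toy usage: accuracy recorded after each of three update steps.
accuracy_curve = [0.0, 1.0, 2.0]
area = trapz(accuracy_curve)
print(area)  # 2.0
```

A method whose accuracy stays high throughout the updates accumulates a larger area than one that reaches the same final accuracy only at the end, so the area serves as a robustness measure.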


Table 3. The performance of PSTOS-ELM and the other ELMs with progressive learning

Dataset     PL-MCCIP    PL-MCCRP   PL-MCCMP    POS-ELM     S-ELM       PSTOS-ELM
Balance     0.8571      0.4762     0.8571      0.8571      0.9206      0.9206
            169.4339    99.8386    169.4339    169.5556    170.8360    169.0212
Iris        0.9778      0.9333     0.9778      0.9556      1.0000      0.9778
            31.1111     21.6000    31.1111     31.5000     33.4556     32.0000
Optdigits   0.3966      0.1537     0.5875      0.3972      0.6200      0.8191
            1190.2444   438.2402   1117.4557   1191.2878   1434.3735   1701.3286
Satellite   0.4838      0.4196     0.5250      0.4838      0.5257      0.7351
            968.5703    422.8419   973.9280    968.5439    837.5622    1246.4811
Waveform    0.8314      0.7442     0.8314      0.8314      0.8388      0.8588
            928.6935    735.6985   928.6935    928.5456    891.3468    927.2332
Wine        1.0000      0.3091     1.0000      0.9818      1.0000      1.0000
            29.8909     10.8818    29.8909     32.2727     32.5727     32.5727
Average     0.7578      0.5060     0.7965      0.7512      0.8175      0.8852
            552.9907    288.1835   541.7522    553.6176    566.6911    684.7728


Table 3 shows that PSTOS-ELM has the highest average accuracy and the highest average area under the
curve. Some points in the result are of interest. The other ELMs with progressive learning have low accuracy
and a small area under the curve on the optdigits and satellite datasets. Both datasets have more than three
classes, which may cause the low accuracy. However, that problem does not affect PSTOS-ELM.


3.3. Discussion 
This article presents PSTOS-ELM, which retains robust accuracy while it is updated with new data and new
class data in online training. The robustness comes from using HBQRD-RLS.

The benefit of HBQRD-RLS is supported by the performance of PSTOS-ELM in the experimental results in
three aspects.

a) Accuracy and robustness: PSTOS-ELM has accuracy comparable to the batch ELM and STOS-ELM.
Furthermore, PSTOS-ELM keeps its robustness when the number of hidden nodes changes and when the data is
updated. This is attributable to the use of HBQRD-RLS, the key component of PSTOS-ELM.

b) Effect of progressive learning: the other ELMs with progressive learning cannot achieve robustness,
especially on datasets that have several classes. In contrast, PSTOS-ELM has almost the same accuracy as
STOS-ELM. This means that progressive learning does not degrade PSTOS-ELM.

c) Computation: when new class samples arrive, STOS-ELM must recalculate the initial model by setting
$H_{0:k}$ to all samples received so far, including the new class samples, and using
$\tilde{R}_{0:k}^{-1} T_{0:k}$ as in (9) to calculate $\beta_{0:k}$. The update complexity of STOS-ELM is
$K \times N_{0:k} \times N_{0:k} \times C$. PSTOS-ELM instead uses

$$\Delta\beta_k = \tilde{R}_k^{-1} \begin{bmatrix} -1 & \cdots & -1 \\ \vdots & \ddots & \vdots \\ -1 & \cdots & -1 \end{bmatrix}_{N_k \times P}$$

as in (14) to update the new class samples and uses (13) to concatenate the result with the old
$\beta_{k-1}$. The update complexity of PSTOS-ELM is $K \times N_k \times N_k \times P$ plus a small
calculation for concatenating $\beta_{k-1}$ in (13). The main difference in update complexity between
STOS-ELM in (9) and PSTOS-ELM in (14) lies in the number of classes (C and P) and the number of samples
($N_{0:k}$ and $N_k$), because PSTOS-ELM uses only the newly arrived samples for training. Therefore,
PSTOS-ELM needs less computation than STOS-ELM when updating with new class data.


4. CONCLUSION
This article discussed PSTOS-ELM, which is based on the HBQRD-RLS algorithm. PSTOS-ELM can retain robust
accuracy while it is updated with new data and new class data in online training. The results showed that the
accuracy and robustness of PSTOS-ELM are comparable to those of the batch learning ELM and STOS-ELM.
Furthermore, PSTOS-ELM reduces the computational complexity of STOS-ELM when updating with new class data.


REFERENCES 


[1] G. -B. Huang, Q. -Y. Zhu, and C. -K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1-3, pp. 489-501, 2006, doi: 10.1016/j.neucom.2005.12.126.
[2] G. -B. Huang, D. H. Wang, and Y. Lan, “Extreme learning machines: a survey,” International journal of machine learning and cybernetics, vol. 2, pp. 107-122, 2011, doi: 10.1007/s13042-011-0019-y.
[3] G. -B. Huang, H. Zhou, X. Ding, and R. Zhang, “Extreme Learning Machine for Regression and Multiclass Classification,” in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 2, pp. 513-529, 2012, doi: 10.1109/TSMCB.2011.2168604.
[4] Ó. F. -Romero, B. G. -Berdiñas, D. M. -Rego, B. P. -Sanchez, and D. P. -Barral, “Online machine learning,” in Efficiency and Scalability Methods for Computational Intellect, IGI Global, 2013, pp. 27-54, doi: 10.4018/978-1-4666-3942-3.ch002.
[5] W. -P. Cao et al., “An ensemble fuzziness-based online sequential learning approach and its application,” International Conference on Knowledge Science, Engineering and Management, 2021, pp. 255-267, doi: 10.1007/978-3-030-82136-4_21.
[6] A. Uçar, Y. Demir, and C. Güzeliş, “A new facial expression recognition based on curvelet transform and online sequential extreme learning machine initialized with spherical clustering,” Neural Computing and Applications, vol. 27, pp. 131-142, 2016, doi: 10.1007/s00521-014-1569-1.
[7] S. Atsawaraungsuk, T. Katanyukul, P. Polpinit, and N. E. -Anant, “Fast and robust online-learning facial expression recognition and innate novelty detection capability of extreme learning algorithms,” Progress in Artificial Intelligence, vol. 11, pp. 151-168, 2022, doi: 10.1007/s13748-021-00266-y.
[8] N. -Y. Liang, G. -B. Huang, P. Saratchandran, and N. Sundararajan, “A Fast and Accurate Online Sequential Learning Algorithm for Feedforward Networks,” in IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1411-1423, 2006, doi: 10.1109/TNN.2006.880583.
[9] M. Moonen and J. Vandewalle, “Recursive least squares with stabilized inverse factorization,” Signal Processing, vol. 21, no. 1, pp. 1-15, 1990, doi: 10.1016/0165-1684(90)90022-Q.
[10] P. Horata, S. Chiewchanwattana, and K. Sunat, “Enhancement of online sequential extreme learning machine based on the householder block exact inverse QRD recursive least squares,” Neurocomputing, vol. 149, pp. 239-252, 2015, doi: 10.1016/j.neucom.2013.10.047.
[11] S. Atsawaraungsuk and T. Katanyukul, “Sin activation structural tolerance of online sequential circular extreme learning machine,” International Journal of Technology, vol. 8, no. 4, 2017, doi: 10.14716/ijtech.v8i4.9476.
[12] S. Atsawaraungsuk, T. Katanyukul, and P. Polpinit, “Identity activation structural tolerance online sequential circular extreme learning machine for highly dimensional data,” Engineering and Applied Science Research, vol. 46, no. 2, pp. 120-129, 2019, doi: 10.14456/easr.2019.15.
[13] J. Zhou, J. Li, C. Wang, H. Wu, C. Zhao, and Q. Wang, “A vegetable disease recognition model for complex background based on region proposal and progressive learning,” Computers and Electronics in Agriculture, vol. 184, 2021, doi: 10.1016/j.compag.2021.106101.
[14] Z. Yang, Y. Hou, Z. Chen, L. Zhang, and J. Chen, “A Multi-Stage Progressive Learning Strategy for Covid-19 Diagnosis Using Chest Computed Tomography with Imbalanced Data,” ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 8578-8582, doi: 10.1109/ICASSP39728.2021.9414745.
[15] R. Venkatesan and M. J. Er, “A novel progressive learning technique for multi-class classification,” Neurocomputing, vol. 207, pp. 310-321, 2016, doi: 10.48550/arXiv.1609.00085.
[16] M. A. Akgün, J. H. Garcelon, and R. T. Haftka, “Fast exact linear and non-linear structural reanalysis and the Sherman-Morrison-Woodbury formulas,” International Journal for Numerical Methods in Engineering, vol. 50, no. 7, pp. 1587-1606, 2001, doi: 10.1002/nme.87.
[17] M. Moonen and J. Vandewalle, “A square root covariance algorithm for constrained recursive least squares estimation,” Journal of VLSI signal processing systems for signal, image and video technology, vol. 3, pp. 163-172, 1991, doi: 10.1007/BF00925827.
[18] J. A. Apolinário, “QRD-RLS adaptive filtering,” New York: Springer, 2009, doi: 10.1007/978-0-387-09734-3.
[19] G. H. Golub and C. F. V. Loan, “Orthogonalization and Least Squares,” in Matrix computations, 3rd ed, Baltimore, MD, USA: The Johns Hopkins University Press, 2013, vol. 3. [Online]. Available: https://twiki.cern.ch/twiki/pub/Main/AVFedotovHowToRootTDecompQRH/Golub_VanLoan.Matr_comp_3ed.pdf
[20] C. -T. Pan and R. Plemmons, “Least squares modifications with inverse factorizations: parallel implications,” in Journal of Computational and Applied Mathematics, vol. 27, no. 1-2, pp. 109-127, 1989, doi: 10.1016/0377-0427(89)90363-4.
[21] A. Asuncion and D. Newman, “UCI machine learning repository,” 2007. [Online]. Available: http://archive.ics.uci.edu/
[22] A. Gholamy, V. Kreinovich, and O. Kosheleva, “Why 70/30 or 80/20 relation between training and testing sets: a pedagogical explanation,” International Journal of Intelligent Technologies and Applied Statistics, 2018, doi: 10.6148/IJITAS.201806_11(2).0003.
[23] A. Stefani and M. Xenos, “Meta-metric evaluation of e-commerce-related metrics,” Electronic Notes in Theoretical Computer Science, vol. 233, no. 27, pp. 59-72, 2009, doi: 10.1016/j.entcs.2009.02.061.
[24] P. Horata, S. Chiewchanwattana, and K. Sunat, “Robust extreme learning machine,” Neurocomputing, vol. 102, pp. 31-44, 2013, doi: 10.1016/j.neucom.2011.12.045.
[25] M. J. Er, R. Venkatesan, N. Wang, and C. -J. Chien, “Progressive learning strategies for multi-class classification,” 2017 International Automatic Control Conference (CACS), 2017, pp. 1-6, doi: 10.1109/CACS.2017.8284266.
[26] C. -L. Lee, Y. -T. Chen, and A. -Y. Wu, “A Scalable Extreme Learning Machine (S-ELM) for Class-Incremental ECG-Based User Identification,” 2021 IEEE International Symposium on Circuits and Systems (ISCAS), 2021, pp. 1-5, doi: 10.1109/ISCAS51556.2021.9401716.


A progressive learning for structural tolerance online sequential extreme ... (Sarutte Atsawaraungsuk) 


ISSN: 1693-6930
BIOGRAPHIES OF AUTHORS 


Sarutte Atsawaraungsuk received the B.Sc. and M.Sc. degrees in Computer Science and the Ph.D. degree in
Computer Engineering from Khon Kaen University, Thailand. Currently, he is an assistant professor at the
Department of Computer Education, Faculty of Education, Udon Thani Rajabhat University, Udon Thani,
Thailand. His research interests are in machine learning, image processing, and their applications. He can be
contacted at email: sarutte@udru.ac.th.


Wasaya Boonphairote obtained a bachelor of arts in English from National and Kapodistrian University of
Athens, Greece, and received a master of arts in Career English for International Communication from
Thammasat University, Thailand. Currently, she is a lecturer at the Language Center, Udon Thani Rajabhat
University. Her research interests focus on the modern Greek language, natural language processing, and spoken
language systems. She can be contacted at email: wasaya.bo@udru.ac.th.


Kritsanapong Somsuk is an associate professor at the Department of Computer and Communication
Engineering, Faculty of Technology, Udon Thani Rajabhat University, Udon Thani, Thailand. He obtained his
M.Eng. (Computer Engineering) from the Department of Computer Engineering, Faculty of Engineering, Khon Kaen
University, his M.Sc. (Computer Science) from the Department of Computer Science, Faculty of Science, Khon
Kaen University, and his Ph.D. (Computer Engineering) from the Department of Computer Engineering, Faculty of
Engineering, Khon Kaen University. His research interests include computer security, cryptography, and
integer factorization algorithms. He can be contacted at email: kritsanapong@udru.ac.th.


Chanwit Suwannapong received the B.Eng., M.Eng., and Ph.D. degrees in Computer Engineering from Khon Kaen
University, Thailand. Currently, he is an assistant professor at the Department of Computer Engineering,
Faculty of Engineering, Nakhon Phanom University, Nakhon Phanom, Thailand. He has published many publications
in the areas of wireless sensor networks, ad hoc networks, and smart agriculture. He can be contacted at email:
schanwit@npu.ac.th.


Suchart Khummanee received the B.Eng. degree in Computer Engineering from the King Mongkut’s Institute of
Technology Ladkrabang, the M.Sc. degree in Computer Science from Khon Kaen University, and the Ph.D. degree in
Computer Engineering from Khon Kaen University, Thailand. He is currently a full lecturer of Computer Science
at Mahasarakham University, Thailand. He can be contacted at email: suchart.k@msu.ac.th.


TELKOMNIKA Telecommun Comput El Control, Vol. 21, No. 5, October 2023: 1039-1050 


